A Text Transformation Scheme for Degenerate Strings

Authors

  • Jacqueline W. Daykin
  • Bruce Watson
Abstract

The Burrows-Wheeler Transformation computes a permutation of a string of letters over an alphabet, and is well-suited to compression-related applications due to its invertibility and data clustering properties. For space efficiency, the input to the transform can be preprocessed into Lyndon factors. We consider scenarios with uncertainty regarding the data: a position in an indeterminate or degenerate string is a set of letters. We first define Indeterminate Lyndon Words and establish their associated unique string factorization; we then introduce the novel Degenerate Burrows-Wheeler Transformation, which may apply the indeterminate Lyndon factorization. A core computation in Burrows-Wheeler type transforms is the linear sorting of all conjugates of the input string; we achieve this in the degenerate case by applying lex-extension ordering. Indeterminate Lyndon factorization, and the degenerate transform and its inverse, can all be computed in linear time and space with respect to the total input size of degenerate strings.
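
As background for the two building blocks named in the abstract, the following is a minimal sketch of the classical versions for ordinary (non-degenerate) strings only: Lyndon factorization via Duval's algorithm, and the Burrows-Wheeler transform obtained by sorting all conjugates, together with its inverse. It is not the paper's degenerate construction and does not use lex-extension ordering. The function names are illustrative assumptions, and the conjugate sort here is a plain comparison sort chosen for readability rather than a linear-time method.

```python
def lyndon_factorization(s):
    """Duval's algorithm: split s into a non-increasing sequence of Lyndon words."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        j, k = i + 1, i
        # Extend the current run while it remains a prefix of a power of a Lyndon word.
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit every full copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def bwt(s):
    """Classical BWT: last column of the sorted conjugates, plus the row of s itself."""
    n = len(s)
    # Sort rotation start positions by the rotation they denote (not linear time).
    order = sorted(range(n), key=lambda i: s[i:] + s[:i])
    last = "".join(s[(i - 1) % n] for i in order)
    return last, order.index(0)


def inverse_bwt(last, idx):
    """Invert the classical BWT given the last column and the row index of the input."""
    n = len(last)
    # A stable sort of positions by letter links each sorted row to its left rotation.
    nxt = sorted(range(n), key=lambda i: last[i])
    out = []
    p = idx
    for _ in range(n):
        p = nxt[p]
        out.append(last[p])
    return "".join(out)
```

For example, calling `bwt("banana")` and passing the result to `inverse_bwt` recovers the input, and `lyndon_factorization("banana")` yields the factors `b`, `an`, `an`, `a`. In a degenerate string each position would be a set of letters rather than a single letter, which this sketch does not model.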

Similar resources

Efficient Pattern Matching in Elastic-Degenerate Strings

In this paper, we extend the notion of gapped strings to elastic-degenerate strings. An elastic-degenerate string can be seen as an ordered collection of k > 1 seeds (substrings/subpatterns) interleaved by elastic-degenerate symbols such that each elastic-degenerate symbol corresponds to a set of two or more variable-length strings. Here, we present an algorithm for solving the pattern matchi...

Convergence of the Euler–Maruyama method for multidimensional SDEs with discontinuous drift and degenerate diffusion coefficient

We prove strong convergence of order [Formula: see text] for arbitrarily small [Formula: see text] of the Euler-Maruyama method for multidimensional stochastic differential equations (SDEs) with discontinuous drift and degenerate diffusion coefficient. The proof is based on estimating the difference between the Euler-Maruyama scheme and another numerical method, which is constructed by applying...

Degenerate String Reconstruction from Cover Arrays

Regularities in degenerate strings have recently been a matter of interest because of their use in fields such as molecular biology, musical text analysis, and cryptanalysis. In this paper, we study the problem of reconstructing a degenerate string from a cover array. We present two efficient algorithms to reconstruct a degenerate string from a valid cover array, one using an unbounded alph...

A Fast and Accurate Global Maximum Power Point Tracking Method for Solar Strings under Partial Shading Conditions

This paper presents a model-based approach for the global maximum power point (GMPP) tracking of solar strings under partial shading conditions. In the proposed method, the GMPP voltage is estimated without any need to solve numerically the implicit and nonlinear equations of the photovoltaic (PV) string model. In contrast to the existing methods in which first the locations of all the local pe...

The Poincaré coset models ISO(d-1,1)/IR and T-duality

We generalize a family of Lagrangians with values in the Poincaré group ISO(d − 1, 1), which contain the description of spinning strings in flat (d − 1) + 1 dimensions, by including symmetric terms in the world-sheet coordinates. Then, by promoting a subgroup H ∼ IR, n ≤ d, which acts invariantly from the left on the element of ISO(d − 1, 1), to a gauge symmetry of the action, we obtain a family...

Publication date: 2014